Conversation

@georgeglarson

Summary

Expands Venice.ai model selection from 5 to 13 models, providing more options for different use cases.

Changes

  • Add Llama 3.1 models (405B, 70B, 8B)
  • Add Deepseek Coder V2 for coding tasks
  • Add Qwen 32B and 72B models
  • Add Mistral Nemo
  • Add Hermes 3 405B

Motivation

Venice.ai offers a wide range of models beyond the initial 5 included in Catwalk. This PR adds 8 additional popular models that are commonly used for:

  • Coding: Deepseek Coder V2
  • Reasoning: Llama 3.1 405B, Hermes 3 405B
  • Cost-effective inference: Llama 3.1 8B, Qwen 32B
  • General purpose: Llama 3.1 70B, Qwen 72B, Mistral Nemo

Testing

Tested with VeniceCode (Venice.ai-optimized fork of Crush) to ensure all models work correctly with the OpenAI-compatible API.
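A smoke test of this kind can be sketched roughly as below. The endpoint URL, environment-variable name, and request shape are assumptions for illustration based on Venice.ai exposing an OpenAI-compatible chat-completions API; they are not taken from this PR.

```python
"""Minimal smoke test against an OpenAI-compatible endpoint (sketch).

BASE_URL and VENICE_API_KEY are illustrative assumptions.
"""
import json
import os
import urllib.request

BASE_URL = "https://api.venice.ai/api/v1"  # assumed endpoint


def build_probe(model_id: str) -> dict:
    """Build a minimal chat-completions request body for one model."""
    return {
        "model": model_id,
        "messages": [{"role": "user", "content": "Say OK."}],
        "max_tokens": 8,
    }


def probe(model_id: str) -> int:
    """POST the probe and return the HTTP status (network required)."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_probe(model_id)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['VENICE_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Running `probe()` once per newly added model id, and checking for a 200 response, would confirm each model is reachable through the compatible API.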

@georgeglarson georgeglarson requested a review from a team as a code owner November 24, 2025 19:31
@georgeglarson georgeglarson requested review from Copilot, meowgorithm and tauraamui and removed request for a team November 24, 2025 19:31
Copilot finished reviewing on behalf of georgeglarson November 24, 2025 19:32

Copilot AI left a comment


Pull request overview

This PR expands the Venice.ai provider model selection from 5 to 13 models, offering users more options for different use cases including coding, reasoning, and cost-effective inference.

  • Adds 8 new models: Llama 3.1 variants (405B, 70B, 8B), Deepseek Coder V2, Qwen 32B and 72B, Mistral Nemo, and Hermes 3 405B
  • Updates default_small_model_id from mistral-31-24b to llama-3.2-3b
  • Adjusts Llama 3.2 3B pricing to be more cost-effective


Comment on lines +85 to +86
"cost_per_1m_in": 0.05,
"cost_per_1m_out": 0.05,

Copilot AI Nov 24, 2025


The pricing for Llama 3.2 3B has been reduced from 0.15/0.6 to 0.05/0.05. This represents a 3x reduction in input costs and 12x reduction in output costs. Verify this significant pricing change is accurate with Venice.ai's current pricing, as such a substantial decrease could impact cost calculations for users.

Suggested change
-   "cost_per_1m_in": 0.05,
-   "cost_per_1m_out": 0.05,
+   "cost_per_1m_in": 0.15,
+   "cost_per_1m_out": 0.6,

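As a quick sanity check on the magnitude of that change, the per-request cost under both rate sets works out as follows (token counts are illustrative, not from the PR):

```python
# Cost impact of the Llama 3.2 3B price change (rates are USD per 1M tokens).
OLD = {"in": 0.15, "out": 0.6}   # previous rates
NEW = {"in": 0.05, "out": 0.05}  # rates in this PR


def cost(rates: dict, tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of one request given per-1M-token rates."""
    return (tokens_in * rates["in"] + tokens_out * rates["out"]) / 1_000_000


# Example request: 10k input tokens, 2k output tokens.
old_cost = cost(OLD, 10_000, 2_000)  # 0.0015 + 0.0012 = 0.0027
new_cost = cost(NEW, 10_000, 2_000)  # 0.0005 + 0.0001 = 0.0006

# The per-field ratios match the review comment: 3x input, 12x output.
print(round(OLD["in"] / NEW["in"]), round(OLD["out"] / NEW["out"]))  # 3 12
```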
Comment on lines +71 to +79
"id": "llama-3.1-8b",
"name": "Llama 3.1 8B",
"cost_per_1m_in": 0.1,
"cost_per_1m_out": 0.1,
"cost_per_1m_in_cached": 0,
"cost_per_1m_out_cached": 0,
"context_window": 128000,
"default_max_tokens": 4096,
"can_reason": true,

Copilot AI Nov 24, 2025


All newly added Llama 3.1 models (8B, 70B, 405B) have 'can_reason' set to true, but the existing Llama 3.2 3B and Llama 3.3 70B models have 'can_reason' set to false. Verify that the reasoning-capability flag is correct across all Llama versions, since Llama 3.1 and 3.2 are closely related model families and the inconsistency is otherwise unexplained.
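A check like the one suggested here could be automated with a small lint pass over the provider config. The entries below are illustrative samples of the fields shown in this PR, not the full file:

```python
# Sketch: consistency lint over provider model entries (sample data only).
MODELS = [
    {"id": "llama-3.1-8b", "can_reason": True,
     "cost_per_1m_in": 0.1, "cost_per_1m_out": 0.1},
    {"id": "llama-3.2-3b", "can_reason": False,
     "cost_per_1m_in": 0.05, "cost_per_1m_out": 0.05},
]


def lint(models: list[dict]) -> list[str]:
    """Flag non-positive prices and Llama entries that disagree on can_reason."""
    warnings = []
    for m in models:
        if m["cost_per_1m_in"] <= 0 or m["cost_per_1m_out"] <= 0:
            warnings.append(f"{m['id']}: non-positive price")
    llama = {m["id"]: m["can_reason"]
             for m in models if m["id"].startswith("llama-")}
    if len(set(llama.values())) > 1:
        warnings.append(f"can_reason differs across Llama models: {llama}")
    return warnings


print(lint(MODELS))  # flags the can_reason mismatch in the sample data
```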

